-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cascaded / Unary union #1246
base: main
Are you sure you want to change the base?
Cascaded / Unary union #1246
Conversation
Did you ask for some polygons? They're not WKT (I can help with the conversion to if needed), but here's a bunch of them I just happened to have on hand: https://data.europa.eu/data/datasets/d18f6f95-1d96-4b86-a4ba-291a416e45bb?locale=en. They're a mix of polygons and multipolygons, but I'm not sure why that would be a problem. |
Some preliminary perf data: a 73.8 mb WKT file, containing 39490 polygons (extract from large-scale agricultural data, thanks @lnicola!) can be unioned into a let unioned = data
.iter()
.fold(MultiPolygon::<f64>::new(vec![]), |acc, poly| {
acc.union(poly)
}); hadn't terminated after 18 minutes, so this is at least a 100x improvement (?). Next I need to verify that the algorithm produces valid output (no reason to think it doesn't) cc @dabreegster please take this for a spin with more real-world data when you get a chance! |
46d5a17
to
885f428
Compare
Since we can I'd also be happy to write a second impl for impl<T, Container> UnaryUnion for Container
where
T: BoolOpsNum,
Container: IntoIterator<Item = MultiPolygon<T>> + Clone,
MultiPolygon<T>: RTreeObject,
{ ... } |
I'm seeing great results from INSPIRE parcels. https://www.dropbox.com/scl/fi/8w9ufilq2c7wq1vp4fei0/London_Borough_of_Southwark.zip?rlkey=ejcug1i5fnkf35d2xjwcogz9m&dl=0 Using Mapshaper output, darker area is features overlapping originally that weren't dissolved: There's more, larger INSPIRE inputs. I'm on battery power right now, so not going to push it :) But a larger run in Birmingham on 60k features ran in about 30s (I didn't time it exactly) and produced 17k outputs. I have even larger OS data I could try, but can't share the inputs. It's particularly impressive/useful that the base boolean ops work on both projected and un-projected inputs! No feedback on this API; it seems fine to me. I have a more niche use case of limiting the maximum area of unioned polygons, so I'll probably adapt what's here to do that. I don't think baking in support for that level of customization makes sense. |
As I understand it, the math is all fundamentally euclidean, so you're playing with fire if you're doing these calculations without first projecting to the euclidean plane. Fire can be kind of fun though. |
speaking of a multithreading flag - this feature might be a good candidate for rayon. (probably not in this PR, but it'd be interesting to see) |
It might be good to have a separate discussion thread about rayon API design. In particular, I'd love it if any rayon integration could use a rayon thread pool provided by the caller, so that higher level apps that already use rayon don't have work split across multiple rayon pools. |
Could you implement it in terms of Something like: - Container: IntoIterator<Item = Polygon<T>> + Clone,
+ Container: IntoIterator<Item = impl BooleanOps<Scalar = T>> + Clone, This should work for both |
You can't use I think we will actually need separate impls even if I can figure out the above because I'm struggling to imagine how we'd specialise the it's currently let fold = |mut accum: MultiPolygon<T>, poly: &Polygon<T>| -> MultiPolygon<T> {
accum = accum.union(poly);
accum
}; but for MultiPolygon it would have to be let fold = |mut accum: MultiPolygon<T>, mpoly: &MultiPolygon<T>| -> MultiPolygon<T> {
accum = accum.union(mpoly);
accum
}; |
I wonder if this would work if Polygon and MultiPolygon were enum members and BooleanOps was impl'd on the enum (followed by UnaryUnion). |
899c782
to
79290f4
Compare
No point in tying myself in knots playing type Tetris:
I think this might be tricky to implement either on your side or in |
For follow-on work, a preliminary parallel version was proposed in georust/rstar#80 (comment): fn parallel_bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, init: I, fold: F, reduce: R) -> S
where
T: RTreeObject,
RTreeNode<T>: Send + Sync,
S: Send,
I: Fn() -> S + Send + Sync,
F: Fn(S, &T) -> S + Send + Sync,
R: Fn(S, S) -> S + Send + Sync,
{
fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &I, fold: &F, reduce: &R) -> S
where
T: RTreeObject,
RTreeNode<T>: Send + Sync,
S: Send,
I: Fn() -> S + Send + Sync,
F: Fn(S, &T) -> S + Send + Sync,
R: Fn(S, S) -> S + Send + Sync,
{
parent
.children()
.into_par_iter()
.fold(init, |accum, child| match child {
RTreeNode::Leaf(value) => fold(accum, value),
RTreeNode::Parent(parent) => {
let value = inner(parent, init, fold, reduce);
reduce(accum, value)
}
})
.reduce(init, reduce)
}
inner(tree.root(), &init, &fold, &reduce)
} |
The type tetris isn't too bad. But concerningly the test doesn't pass for multipolygon. I'm not sure if this represents a bug in BooleanOps or in UnaryUnion (or something else): 39a2f62 I'm OK without merging support for MultiPolygons at present, but wanted to highlight this behavior in case it's meaningful to you. |
Oh nice job, I burned a couple of hours on these today. The test result should be the same AFAIK. What's the actual output geometry?? |
One thing I wonder about the performance is whether the [ |
The correct output should be: Polygon { exterior:
LineString([
Coord { x: 200.38308698623, y: 288.3387931645567 },
Coord { x: 204.07584924189865, y: 288.2701221168692 },
Coord { x: 210.99999999939024, y: 291.99999999857494 },
Coord { x: 209.99999999939024, y: 289.99999999857494 },
Coord { x: 212.24082540659725, y: 285.47846008694717 },
Coord { x: 205.630617245422, y: 287.73853614038774 },
Coord { x: 204.00000000684082, y: 287.0000000060255 },
Coord { x: 200.38308698623, y: 288.3387931645567 }]), interiors: [] }] But what we get is: [Polygon { exterior:
LineString([
Coord { x: 200.38308698977124, y: 288.3387931571061 },
Coord { x: 204.07584924543988, y: 288.2701221094186 },
Coord { x: 205.63061724896323, y: 287.73853613293716 },
Coord { x: 204.00000001038205, y: 287.0000000060255 },
Coord { x: 200.38308698977124, y: 288.3387931571061 }]), interiors: [] },
Polygon { exterior: LineString([
Coord { x: 204.0758492379893, y: 288.2701221168692 },
Coord { x: 210.9999999954809, y: 291.99999999857494 },
Coord { x: 209.9999999954809, y: 289.99999999857494 },
Coord { x: 212.24082541013848, y: 285.47846008694717 },
Coord { x: 212.24082539523732, y: 285.47846009439775 },
Coord { x: 212.2408254026879, y: 285.47846008694717 },
Coord { x: 205.63061724896323, y: 287.73853613293716 },
Coord { x: 205.6306172564138, y: 287.73853614038774 },
Coord { x: 204.07584926034104, y: 288.2701221094186 },
Coord { x: 204.07584924543988, y: 288.2701221094186 },
Coord { x: 204.0758492379893, y: 288.2701221168692 }]), interiors: [] }
]; That's not any kind of correct. Hm. |
I think this is a bug in the Using the test polygons, you should get the same result if you do: let empty = MultiPolygon::<f64>::new(vec![]);
// multi_poly_1 is [poly1, poly2]
let empty_union = empty.union(&multi_poly_1);
// and
let mp1 = MultiPolygon::new(vec![poly1.clone()]);
let mp2 = MultiPolygon::new(vec![poly2.clone()]);
let mpu = &mp1.union(&mp2); …but you don't. That's a bug unless I'm missing a detail about how |
I was able to get the tests passing by making the multipolygon valid: 5900563 Note the usage of |
This will pass when #1254 merges.
Urgh I should have checked validity before anything else. Sorry. |
…e reborrow We now reborrow, but we could also deref: accum = accum.union(&**poly.0);
So we know what the hell we were doing if we ever have to revisit this
9f1e643
to
83e06bd
Compare
CHANGES.md
if knowledge of this change could be valuable to users.This is a draft since it currently has unknown perf characteristics, and there may be better ways of defining the trait implementers